AITopics | adaptive gradient method

adaptive gradient method


SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients

Neural Information Processing Systems, Apr-25-2026, 19:12:02 GMT

Adaptive gradient methods have shown excellent performances for solving many machine learning problems. Although multiple adaptive gradient methods were recently studied, they mainly focus on either empirical or theoretical aspects and also only work for specific problems by using some specific adaptive learning rates. Thus, it is desired to design a universal framework for practical algorithms of adaptive gradients with theoretical guarantee to solve general problems. To fill this gap, we propose a faster and universal framework of adaptive gradients (i.e., SUPER-ADAM) by introducing a universal adaptive matrix that includes most existing adaptive gradient forms. Moreover, our framework can flexibly integrate the momentum and variance reduced techniques. In particular, our novel framework provides the convergence analysis support for adaptive gradient methods under the nonconvex setting. In theoretical analysis, we prove that our SUPER-ADAM algorithm can achieve the best known gradient (i.e., stochastic first-order oracle (SFO)) complexity of $\tilde{O}(\epsilon^{-3})$ for finding an $\epsilon$-stationary point of nonconvex optimization, which matches the lower bound for stochastic smooth nonconvex optimization. In numerical experiments, we employ various deep learning tasks to validate that our algorithm consistently outperforms the existing adaptive algorithms.
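To make the idea of an adaptive matrix combined with momentum more concrete, here is a minimal Python sketch. It is not the SUPER-ADAM algorithm from the paper: it assumes a diagonal, Adam-style matrix H = diag(sqrt(v) + lam), leaves out the variance-reduction component entirely, and the names `adaptive_step`, `lr`, `beta1`, `beta2`, and `lam` are hypothetical choices made for illustration.

```python
import math

def adaptive_step(x, g, m, v, lr=0.1, beta1=0.9, beta2=0.999, lam=1e-8):
    """One diagonal adaptive-gradient step on plain lists of floats.

    Assumption (not from the paper): H = diag(sqrt(v) + lam), where v is an
    exponential moving average of squared gradients, and m is an exponential
    moving average (momentum) of the gradients.
    """
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
    h = [math.sqrt(vi) + lam for vi in v]                    # diagonal of H
    x = [xi - lr * mi / hi for xi, mi, hi in zip(x, m, h)]   # x - lr * H^{-1} m
    return x, m, v

# Usage on f(x) = 0.5 * ||x||^2, whose gradient at x is x itself.
x, m, v = [3.0, -2.0], [0.0, 0.0], [0.0, 0.0]
for _ in range(10):
    x, m, v = adaptive_step(x, list(x), m, v)
```

Other ways of building the diagonal `h` (for example, a single global scale shared by all coordinates) would slot into the same step without changing the rest of the update.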

algorithm, artificial intelligence, machine learning, (15 more...)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Appendix: On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them

Neural Information Processing Systems, Apr-24-2026, 06:50:14 GMT

Suppose we have a non-zero solution θ that is a stationary point of f(θ, t) at the t-th step, and SGD finds θ_t = θ at the t-th step. Theorem 2.2 of Shapiro and Wardi [9] tells us that the learning rate must be small enough for convergence. Obviously, we have η < in practice. Since η_t = η_{t+1} does not hold, SGD cannot converge to any non-zero stationary point. The proof is now complete.
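A small numerical sketch may help show why a changing learning rate can prevent SGD from settling at a non-zero point. The excerpt does not state the update rule, so everything here is an assumption: the update is taken to be θ ← (1 − λ)θ − η ∇L(θ) with a fixed decay factor λ, the loss is the toy function L(θ) = 0.5(θ − 1)², and the names `step` and `fixed_point` are hypothetical.

```python
def step(theta, lr, decay):
    """Assumed update: theta <- (1 - decay) * theta - lr * L'(theta),
    with the toy loss L(theta) = 0.5 * (theta - 1)^2."""
    grad = theta - 1.0
    return (1.0 - decay) * theta - lr * grad

def fixed_point(lr, decay):
    """Closed-form solution of theta = step(theta, lr, decay)."""
    return lr / (lr + decay)

theta = fixed_point(lr=0.1, decay=0.01)
same_lr = step(theta, lr=0.1, decay=0.01)    # stays at theta (up to rounding)
new_lr = step(theta, lr=0.05, decay=0.01)    # moves away from theta
```

Under these assumptions the fixed point η / (η + λ) depends on η, so a point that is stationary for one learning rate is no longer stationary once the learning rate changes.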

artificial intelligence, deep learning, machine learning, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

040d3b6af368bf71f952c18da5713b48-Paper-Conference.pdf

Neural Information Processing Systems, Apr-24-2026, 06:50:11 GMT

artificial intelligence, deep learning, machine learning, (17 more...)

Genre: Research Report (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

84d286e32bbee8fa3a86ee9c50e00081-Paper-Conference.pdf

Neural Information Processing Systems, Feb-16-2026, 07:27:09 GMT

artificial intelligence, machine learning, natural language, (17 more...)

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
North America > Canada > Ontario > Toronto (0.04)
North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients

Neural Information Processing Systems, Feb-8-2026, 13:34:36 GMT

Although multiple adaptive gradient methods were recently studied, they mainly focus on either empirical or theoretical aspects and also only work for specific problems by using some specific adaptive learning rates.

algorithm, artificial intelligence, machine learning, (18 more...)

Country: North America > United States (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients

Neural Information Processing Systems, Feb-8-2026, 13:34:32 GMT

algorithm, artificial intelligence, machine learning, (17 more...)

Country: North America > United States (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

040d3b6af368bf71f952c18da5713b48-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-7-2026, 07:55:30 GMT

adams, adaptive gradient method, weight decay, (12 more...)

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

040d3b6af368bf71f952c18da5713b48-Paper-Conference.pdf

Neural Information Processing Systems, Feb-7-2026, 07:55:27 GMT

gradient norm, scheduler, weight decay, (14 more...)

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Momentum Centering and Asynchronous Update for Adaptive Gradient Methods

Neural Information Processing Systems, Dec-25-2025, 04:41:24 GMT

We propose ACProp (Asynchronous-centering-Prop), an adaptive optimizer which combines centering of second momentum and asynchronous update (e.g. for $t$-th update, denominator uses information up to step $t-1$, while numerator uses gradient at $t$-th step). ACProp has both strong theoretical properties and empirical performance. With the example by Reddi et al. (2018), we show that asynchronous optimizers (e.g. AdaShift, ACProp) have weaker convergence condition than synchronous optimizers (e.g. Adam, RMSProp, AdaBelief); within asynchronous optimizers, we show that centering of second momentum further weakens the convergence condition. We demonstrate that ACProp has a convergence rate of $O(\frac{1}{\sqrt{T}})$ for the stochastic non-convex case, which matches the oracle rate and outperforms the $O(\frac{\log T}{\sqrt{T}})$ rate of RMSProp and Adam. We validate ACProp in extensive empirical studies: ACProp outperforms both SGD and other adaptive optimizers in image classification with CNN, and outperforms well-tuned adaptive optimizers in the training of various GAN models, reinforcement learning and transformers. To sum up, ACProp has good theoretical properties including weak convergence condition and optimal convergence rate, and strong empirical performance including good generalization like SGD and training stability like Adam.
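The two ingredients named in the abstract, an asynchronous denominator and a centered second moment, can be sketched for a single scalar parameter as follows. This is a simplified reading of the abstract rather than the authors' implementation: the class name `ACPropSketch`, the hyperparameter defaults, the placement of `eps`, and the choice to skip the parameter update on the very first call (when no past statistics exist yet) are all assumptions.

```python
import math

class ACPropSketch:
    """Scalar sketch of an asynchronous, centered adaptive step (assumed form)."""

    def __init__(self, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # moving average of gradients, used as the center
        self.s = 0.0  # moving average of centered squared gradients
        self.t = 0    # number of calls to step so far

    def step(self, x, g):
        self.t += 1
        if self.t > 1:
            # Asynchronous: the denominator uses s built from steps up to
            # t-1, while the numerator is the current gradient g.
            x = x - self.lr * g / (math.sqrt(self.s) + self.eps)
        # Only after the parameter update is the state refreshed with g.
        self.m = self.beta1 * self.m + (1 - self.beta1) * g
        # Centering: track (g - m)^2 instead of g^2.
        self.s = self.beta2 * self.s + (1 - self.beta2) * (g - self.m) ** 2
        return x

# Usage on f(x) = 0.5 * x^2, whose gradient at x is x.
opt = ACPropSketch()
x = 1.0
for _ in range(5):
    x = opt.step(x, x)
```

A synchronous optimizer such as Adam would instead refresh its second-moment estimate with the current gradient before dividing by it; the ordering of the two halves of `step` is what the sketch is meant to highlight.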

acprop, momentum centering and asynchronous update, optimizer, (11 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)
